feat: added configurable num of threads for xnnpack to fix android performance #534

Merged
mkopcins merged 7 commits into main from @mkopcins/android-perf
Aug 27, 2025

@mkopcins
Collaborator

Description

After migrating LLMs to the C++ architecture, we saw a significant drop in performance (up to 10x slower). After debugging, the main culprit turned out to be the number of threads spawned for the XNNPACK threadpool, which defaulted to the number of cores (the underlying reason is still unknown).
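
As a rough illustration of what "configurable number of threads" could look like, here is a minimal, self-contained sketch. The helper name `resolveThreadCount`, the convention that `0` means "not configured", and the half-the-cores fallback are all assumptions made for this example; they are not taken from the PR, and the call that would hand the result to the XNNPACK threadpool is deliberately left out.

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>

// Hypothetical helper (not from the PR): resolve how many threads a CPU
// threadpool should use instead of always spawning one per core.
// `requested` == 0 is assumed to mean "no explicit setting".
inline std::uint32_t resolveThreadCount(std::uint32_t requested,
                                        std::uint32_t availableCores) {
  // A core count of 0 means "unknown"; treat it as a single core.
  const std::uint32_t cores = std::max<std::uint32_t>(availableCores, 1);
  if (requested == 0) {
    // Arbitrary illustrative fallback: half the cores, but at least one.
    return std::max<std::uint32_t>(cores / 2, 1);
  }
  // Never ask for more threads than there are cores.
  return std::min(requested, cores);
}

// Convenience overload that queries the core count from the standard library.
// std::thread::hardware_concurrency() may return 0 if it cannot be determined.
inline std::uint32_t resolveThreadCount(std::uint32_t requested) {
  return resolveThreadCount(requested, std::thread::hardware_concurrency());
}
```

With this sketch, `resolveThreadCount(2, 8)` yields 2, while an unset value on an 8-core device falls back to 4. Keeping the core count as an explicit parameter makes the policy easy to test without depending on the device it runs on.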

Introduces a breaking change?

  • [x] Yes - migrating back to cpp architecture for llms
  • [ ] No

Type of change

  • [x] Bug fix (change which fixes an issue)
  • [x] New feature (change which adds functionality)
  • [ ] Documentation update (improves or adds clarity to existing documentation)
  • [ ] Other (chores, tests, code style improvements etc.)

Tested on

  • [x] iOS
  • [x] Android

Testing instructions

Screenshots

Related issues

Checklist

  • [ ] I have performed a self-review of my code
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have updated the documentation accordingly
  • [ ] My changes generate no new warnings

Additional notes

mkopcins force-pushed the @mkopcins/android-perf branch from 24b8943 to 7f8c7f1 on August 25, 2025 09:21
mkopcins requested a review from chmjkb on August 25, 2025 09:21
mkopcins marked this pull request as ready for review on August 25, 2025 09:25
Comment thread packages/react-native-executorch/common/rnexecutorch/RnExecutorchInstaller.cpp Outdated
Comment on lines +4 to +9
<dict>
<key>com.apple.developer.kernel.increased-debugging-memory-limit</key>
<true/>
<key>com.apple.developer.kernel.increased-memory-limit</key>
<true/>
</dict>
Collaborator

I'm wondering if we truly need this; the app works without it, and Xcode won't let me run the app when it is specified.

Comment thread packages/react-native-executorch/android/libs/arm64-v8a/libexecutorch.so Outdated
Comment thread packages/react-native-executorch/android/libs/x86_64/libexecutorch.so Outdated
Comment on lines +45 to +46
if(CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64")
target_compile_definitions(react-native-executorch PRIVATE ARCH_ARM64)
Collaborator

Why not use
if(ANDROID_ABI STREQUAL "arm64-v8a") instead, as is done in the code that follows?
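
For context, whichever CMake condition is used, the resulting `ARCH_ARM64` compile definition would typically be consumed through the preprocessor on the C++ side. The sketch below only illustrates that mechanism; the function name `buildArchLabel` is hypothetical, and how the PR actually uses the definition is not shown in this conversation.

```cpp
#include <string>

// ARCH_ARM64 is the definition set by target_compile_definitions in the
// snippet under review; it is simply absent when building for other ABIs.
// buildArchLabel is a hypothetical name used only for this illustration.
inline std::string buildArchLabel() {
#ifdef ARCH_ARM64
  return "arm64";   // compiled in only when ARCH_ARM64 is defined
#else
  return "other";   // fallback for every other target
#endif
}
```

Because the branch is selected at compile time, either CMake condition produces the same C++ behavior as long as it defines `ARCH_ARM64` for the same set of builds.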

mkopcins merged commit 237aba1 into main on Aug 27, 2025
3 checks passed
mkopcins deleted the @mkopcins/android-perf branch on August 27, 2025 11:09
mkopcins added a commit that referenced this pull request Sep 2, 2025
…rformance (#534)

## Description

After migrating LLMs to the C++ architecture, we saw a significant drop
in performance (up to 10x slower). After debugging, the main culprit
turned out to be the number of threads spawned for the XNNPACK
threadpool, which defaulted to the number of cores (the underlying
reason is still unknown).

### Introduces a breaking change?

- [x] Yes - migrating back to cpp architecture for llms
- [ ] No

### Type of change

- [x] Bug fix (change which fixes an issue)
- [x] New feature (change which adds functionality)
- [ ] Documentation update (improves or adds clarity to existing
documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [x] iOS
- [x] Android

### Testing instructions

<!-- Provide step-by-step instructions on how to test your changes.
Include setup details if necessary. -->

### Screenshots

<!-- Add screenshots here, if applicable -->

### Related issues

<!-- Link related issues here using #issue-number -->

### Checklist

- [ ] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have updated the documentation accordingly
- [ ] My changes generate no new warnings

### Additional notes

<!-- Include any additional information, assumptions, or context that
reviewers might need to understand this PR. -->

---------

Co-authored-by: Mateusz Kopciński <mateusz.kopcinski@swmansnion.com>
KnextKoder pushed a commit to Synkhiv/react-native-executorch that referenced this pull request Nov 7, 2025
…rformance (software-mansion#534)
